9 research outputs found

    CCG-augmented hierarchical phrase-based statistical machine translation

    Augmenting Statistical Machine Translation (SMT) systems with syntactic information aims at improving translation quality. Hierarchical Phrase-Based (HPB) SMT takes a step toward incorporating syntax in Phrase-Based (PB) SMT by modelling one aspect of language syntax, namely the hierarchical structure of phrases. Syntax Augmented Machine Translation (SAMT) further incorporates syntactic information extracted using context-free phrase structure grammar (CF-PSG) in the HPB SMT model. One of the main challenges facing CF-PSG-based augmentation approaches for SMT systems emerges from the difference in the definition of the constituent in CF-PSG and the ‘phrase’ in SMT systems, which hinders the ability of CF-PSG to express the syntactic function of many SMT phrases. Although the SAMT approach to solving this problem using ‘CCG-like’ operators to combine constituent labels improves syntactic constraint coverage, it significantly increases their sparsity, which restricts translation and negatively affects its quality. In this thesis, we address the problems of sparsity and limited coverage of syntactic constraints facing the CF-PSG-based syntax augmentation approaches for HPB SMT using Combinatory Categorial Grammar (CCG). We demonstrate that CCG’s flexible structures and rich syntactic descriptors help to extract richer, more expressive and less sparse syntactic constraints with better coverage than CF-PSG, which enables our CCG-augmented HPB system to outperform the SAMT system. We also try to soften the syntactic constraints imposed by CCG category nonterminal labels by extracting less fine-grained CCG-based labels. We demonstrate that CCG label simplification helps to significantly improve the performance of our CCG category HPB system. Finally, we identify the factors which limit the coverage of the syntactic constraints in our CCG-augmented HPB model. We then try to tackle these factors by extending the definition of the nonterminal label to be composed of a sequence of CCG categories and augmenting the glue grammar with CCG combinatory rules. We demonstrate that our extension approaches help to significantly increase the scope of the syntactic constraints applied in our CCG-augmented HPB model and achieve significant improvements over the HPB SMT baseline.

    CCG contextual labels in hierarchical phrase-based SMT

    In this paper, we present a method to employ target-side syntactic contextual information in a Hierarchical Phrase-Based system. Our method uses Combinatory Categorial Grammar (CCG) to annotate training data with labels that represent the left and right syntactic context of target-side phrases. These labels are then used to assign labels to nonterminals in hierarchical rules. CCG-based contextual labels help to produce more grammatical translations by forcing phrases which replace nonterminals during translation to comply with the contextual constraints imposed by the labels. We present experiments which examine the performance of CCG contextual labels on Chinese–English and Arabic–English translation in the news and speech expressions domains using different data sizes and CCG-labeling settings. Our experiments show that our CCG contextual labels-based system achieved a 2.42% relative BLEU improvement over a Phrase-Based baseline on Arabic–English translation and a 1% relative BLEU improvement over a Hierarchical Phrase-Based system baseline on Chinese–English translation.

    Definition of interfaces

    The aim of this report is to define the interfaces for the tools used in the MT development and evaluation scenarios as included in the QTLaunchPad (QTLP) infrastructure. Specification of the interfaces is important for the interaction and interoperability of the tools in the developed QTLP infrastructure. In addressing this aim, the report provides (1) descriptions of the common aspects of the tools and their standardized data formats and (2) descriptions of the interfaces for the tools for interoperability, where the tools are categorized into preparation, development, and evaluation categories, including the human interfaces for quality assessment with multidimensional quality metrics. Interface specifications allow a modular tool infrastructure, flexibly selecting among alternative implementations, enabling realistic expectations to be made at different sections of the QTLP information flow pipeline, and supporting the QTLP infrastructure. D3.2.1 allows the emergence of the QTLP infrastructure and helps the identification and acquisition of existing tools (D4.4.1), the integration of identified language processing tools (D3.3.1), their implementation (D3.4.1), and their testing (D3.5.1). QTLP infrastructure will facilitate the organization and running of the quality translation shared task (D5.2.1). We also provide human interfaces for translation quality assessment with the multidimensional quality metrics (D1.1.1). D3.2.1 is a living document until M12, which is when the identification and acquisition of existing tools (D4.4.1) and the implementation of identified language processing tools (D3.4.1) are due.

    MaTrEx: the DCU MT system for NTCIR-8

    This paper gives the system description of the Dublin City University Machine Translation system MaTrEx for our participation in the translation subtask in the NTCIR-8 Patent Translation Task under the team ID of DCUMT. Four techniques are deployed in our systems: supertagged PB-SMT, context-informed PB-SMT, noise reduction, and system combination. For EN-JP, our system ranked second in terms of BLEU reference score among six participants.

    A survey of machine translation competences: insights for translation technology educators and practitioners

    This paper describes a large-scale survey of machine translation (MT) competencies conducted by a non-commercial and publicly funded European research project. Firstly, we highlight the increased prevalence of translation technologies in the translation and localisation industry, and build on this by reporting on survey data derived from 438 validated respondents, including freelance translators, language service providers, translator trainers, and academics. We then focus on ascertaining the prevalence of translation technology usage on a fine-grained scale to address aspects of MT, quality assessment techniques and post-editing. We report a strong need for an improvement in quality assessment methods, tools, and training, partly due to the large variance in approaches and combinations of methods, and to the lack of knowledge and resources. We note the growing uptake of MT and the perceived increase of its prevalence in future workflows. We find that this adoption of MT has led to significant changes in the human translation process, in which post-editing appears to be exclusively used for high-quality content publication. Lastly, we echo the needs of the translation industry and community in an attempt to provide a more comprehensive snapshot to inform the provision of translation training and the need for increased technical competencies.
